Abstract

This paper reports on user interface design and
evaluation for a mobile, outdoor, augmented reality
(AR) application. This novel system, called the
Battlefield Augmented Reality System (BARS), supports
information presentation and entry for situation
awareness in an urban warfighting setting. To our
knowledge, this is the first time usability engineering
has been extensively and systematically applied to
development of a real-world AR system.

Our BARS team has applied a cost-effective progression
of usability engineering activities from the very
beginning of BARS development. We discuss how we first
applied cycles of structured expert evaluations to BARS
user interface development, employing user interface
mockups representing occluded (non-visible) objects.
Then we discuss how results of these evaluations
informed our subsequent user-based statistical
evaluations and formative evaluations, and present
these evaluations and their outcomes. Finally, we
discuss how and why this sequence of types of
evaluation is cost-effective.

1.  Introduction 

For more than two decades, through our work in
human-computer interaction and usability engineering,
we have pursued the goals of developing, applying, and
extending methods for improving the usability of
interactive software applications. In particular, our
work has focused on high-impact, cost-effective
techniques for evaluating usability of interactive
systems. By “high-impact” and “cost-effective”, we mean
that we have as a goal the development of
methodological techniques that reduce the total life
cycle cost of an interactive software application.

Usability engineering produces highly usable user
interfaces that are essential to improved user
experiences and productivity, as well as reduced user
errors. Unfortunately, managers and developers often
have the misconception that usability engineering
activities add costs to a product’s development life
cycle. In fact, usability engineering can reduce
development costs over the life of the product by, for
example, decreasing the need to add missed
functionality later in the development cycle, when such
additions are much more expensive. The process is an
integral part of interactive software development, just
as systems engineering and software engineering are.
Usability engineering activities can be tailored as
needed for a specific project or product development
effort.

The usability engineering process applies to any
interactive system, ranging from training applications
to multimedia CD-ROMs to augmented and virtual
environments to simulation applications to graphical
user interfaces (GUIs). The usability engineering
process is flexible enough to be applied at any stage
of the development life cycle, and its various
activities are generalizable and adaptable across
development of all interactive systems. However, just
like good software engineering practices [1], early use
of the process provides the best opportunity for cost
savings.

In this paper, we discuss user interface design and
evaluation for a mobile, outdoor, augmented reality
(AR) application. This novel system, called the
Battlefield Augmented Reality System (BARS), supports
information presentation and entry for situation
awareness when conducting urban military operations. We
have systematically incorporated a cost-effective
progression of usability engineering activities from
the very beginning of BARS development. Thus, this
paper focuses on the specific process by which we
evaluated the BARS product. Results of our usability
engineering process as applied to numerous other
products can be found, for example, in [4, 5, 7, 9, 10].

To our knowledge, this is the first time usability
engineering has been extensively and systematically
applied to the research and development process of a
real-world AR system. In fact, a comprehensive
literature review of 880 papers from the leading
augmented reality/virtual reality conferences and
publication sources showed 25 papers (less than 3%)
that had any human-computer interaction discussion, and
of those, only 14 (about 1.5%) reported a user-based
study [15].

2.  What  is  usability  engineering? 

Usability engineering is a cost-effective,
user-centered process that ensures a high level of
effectiveness, efficiency, and safety in complex
interactive systems [6]. Figure 1 shows a simple
diagram of major usability engineering activities,
which include domain analysis, quantifiable
user-centered requirements and metrics, conceptual and
detailed user interface design, rapid prototyping, and
various kinds of usability evaluations of the user
interface. Usability engineering includes both design
and evaluations with users; it is not typically
extensive hypothesis-testing-based experimentation, but
instead is structured, iterative user-centered design
and evaluation applied during all phases of the
interactive system development life cycle. Most extant
usability engineering methods widely in use were
spawned by the development of traditional desktop
graphical user interfaces (GUIs).


Figure 1. Typical activities performed during the
usability engineering process. Although the usual flow
is left-to-right from activity to activity,
outward-pointing arrows indicate the substantial
feedback and iteration that occur in practice.

Since the focus of this paper is the usability
engineering activity of usability evaluation, in the
following sections we briefly explain several types of
usability evaluation. These include expert evaluation
(also sometimes called heuristic evaluation or
usability inspection), user-based statistical
evaluation, formative evaluation, and summative
evaluation. These brief introductory explanations are,
of necessity, rather abstract. In Section 4, we
present, very concretely, how we applied the first
three of these types of evaluations to BARS
development. These types of evaluations are applicable
to the user interface of essentially any interactive
software application.

2.1.  Expert  usability  evaluation 

The process of identifying potential usability problems
by comparing a user interface design to established
usability design guidelines is called expert usability
evaluation (or heuristic evaluation or usability
inspection). Those identified problems are then used to
derive recommendations for improving that design. This
method is used by usability experts to identify
critical usability issues early in the development
cycle, so that these design issues can be addressed as
part of the iterative design process [11]. Often the
usability experts rely explicitly and solely on
established usability design guidelines to determine
whether a user interface design effectively and
efficiently supports user task performance (i.e., has
high usability).

Usability experts may also rely more implicitly on
design guidelines while they work through user task
scenarios (typically created during domain analysis,
another usability engineering activity in Figure 1)
during their evaluation. Each evaluator first inspects
the design alone, independently of other evaluators’
findings. All evaluators then combine their data to
analyze both common and conflicting usability findings.
Nielsen [11] recommends using three to five evaluators,
who together will find a majority of the most severe
usability problems. He has also shown empirically that
fewer evaluators generally identify only a small subset
of problems and that more evaluators produce
diminishing returns at higher costs. Results from an
expert evaluation should not only identify problematic
user interface components and interaction techniques,
but should also indicate why a particular component or
technique is problematic. Results of this type of
evaluation typically are not applicable across a
variety of different applications, since the purpose of
the evaluation is to assess specific components or
techniques for a specific application. This is arguably
the most cost-effective type of usability evaluation,
because it does not involve users.
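
The diminishing return from adding evaluators can be
illustrated with a simple problem-discovery model of the
kind usually attributed to Nielsen and Landauer, in
which n independent evaluators are expected to find a
proportion 1 - (1 - r)^n of the problems. The model, the
function names, and the per-evaluator detection rate r =
0.31 are assumptions introduced here for illustration;
none of them is taken from this paper.

```python
def proportion_found(n_evaluators: int, detection_rate: float = 0.31) -> float:
    """Expected share of usability problems found by n independent evaluators.

    detection_rate is a hypothetical probability that one evaluator finds
    any given problem; 0.31 is an illustrative value, not a BARS result.
    """
    if n_evaluators < 0:
        raise ValueError("n_evaluators must be non-negative")
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must lie in [0, 1]")
    return 1.0 - (1.0 - detection_rate) ** n_evaluators


def marginal_gain(n_evaluators: int, detection_rate: float = 0.31) -> float:
    """Additional share of problems contributed by the n-th evaluator (n >= 1)."""
    if n_evaluators < 1:
        raise ValueError("n_evaluators must be at least 1")
    return (proportion_found(n_evaluators, detection_rate)
            - proportion_found(n_evaluators - 1, detection_rate))


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"{n} evaluators: {proportion_found(n):.2f} of problems found, "
              f"+{marginal_gain(n):.2f} from the last evaluator")
```

Under these assumed values, three evaluators find roughly
two thirds of the problems and five find about 84%, while
each additional evaluator contributes less than the one
before.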

2.2.  User-based  statistical  evaluation

We call the process of performing relatively small and
quick empirical studies to determine what specific
design factors are most likely to affect user task
performance user-based statistical evaluation. This can
be especially effective when designing a user interface
to support novel hardware, domains, and user tasks.
Such evaluations typically focus on lower-level
cognitive or perceptual tasks, where the importance of
these tasks would be suggested by earlier activities in
the usability engineering process. These studies are
usually targeted at a specific part (e.g., a component
or feature) of a user interface design, as opposed to
the user interface as a whole. They may involve tasks
that are atomic components of higher-level
representative user tasks, and the tasks are often
generic rather than application-specific. These
evaluations are very similar to traditional human
factors experiments and are guided by a well-crafted
experimental design to assess user performance by
varying design factors. Users perform tasks that are
narrowly focused and carefully designed to study a
specific user interface component or feature.

Such evaluations help refine various user interface
components or features, in preparation for more
comprehensive and application-specific formative
evaluations. Our experience indicates that quick,
iterative user-based statistical evaluations produce
mature user interface components and features that are
well-suited to support overall application tasks and
user task flow. Results of this type of evaluation
typically are not applicable across a variety of
different applications, since the purpose of the
evaluation is to refine components or features for a
specific application.

2.3.  Formative  usability  evaluation 

The process of assessing, refining, and improving a
user interface design by having representative users
perform task-based scenarios, observing their
performance, and collecting data to empirically
identify usability problems [6] is called formative
usability evaluation. This observational evaluation
method can ensure usability of interactive systems by
including users early and continually throughout user
interface development. The method relies heavily on
usage context (e.g., user tasks, user environment, user
profiles), as well as a solid understanding of
human-computer interaction. The term formative
evaluation was coined by Scriven [13] to define a type
of evaluation that is applied during evolving or
formative stages of design. Scriven used this in the
educational domain for instructional design. Williges
[16] and Hix and Hartson [6] extended and refined the
concept of formative evaluation for the human-computer
interaction and usability engineering domain.

A typical cycle of formative evaluation begins with
creation of user scenarios based on domain analysis
activities. These scenarios are specifically designed
to explore and evaluate user tasks, information, and
work flows. Representative users perform these tasks as
evaluators collect both qualitative and quantitative
data. Qualitative data include critical incidents [3],
which are user events that have a significant impact,
either positive or negative, on users’ task performance
and/or satisfaction. Quantitative data include metrics
such as how long it takes a user to perform a specific
task, the number of errors a user makes during task
performance, measures of user satisfaction, and so on.
Collected quantitative data are then compared to
appropriate baseline metrics, sometimes redefining or
altering evaluators’ perceptions of what should be
considered baseline. Qualitative and quantitative data
are equally important, since each provides unique
insight into a user interface design's strengths and
weaknesses. Finally, evaluators analyze these data to
identify user interface components or features that
either support or detract from user task performance,
and to suggest and prioritize user interface design
changes. As with the first two types of evaluations,
results of this type of evaluation typically are not
applicable across a variety of different applications,
since formative evaluation is designed to assess a
specific application.
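
The comparison of quantitative data against baseline
metrics can be sketched as follows. This is a minimal
illustration under stated assumptions: the metric names,
baseline values, and the rule of comparing a simple mean
to the baseline are all hypothetical and are not drawn
from the BARS evaluations.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MetricResult:
    name: str
    observed_mean: float
    baseline: float
    meets_baseline: bool


def compare_to_baseline(observations, baselines, lower_is_better):
    """Compare the mean of each observed metric with its baseline value.

    observations:    metric name -> list of per-user measurements
    baselines:       metric name -> baseline value set before the evaluation
    lower_is_better: metric name -> True for metrics such as time or errors
    """
    results = []
    for name, values in observations.items():
        observed = mean(values)
        target = baselines[name]
        if lower_is_better[name]:
            ok = observed <= target
        else:
            ok = observed >= target
        results.append(MetricResult(name, observed, target, ok))
    return results


if __name__ == "__main__":
    # Hypothetical measurements from four users on one task scenario.
    observations = {
        "task_time_s": [42.0, 55.0, 61.0, 48.0],
        "errors": [1, 0, 3, 2],
        "satisfaction": [5, 6, 4, 6],
    }
    baselines = {"task_time_s": 60.0, "errors": 1.0, "satisfaction": 5.0}
    lower_is_better = {"task_time_s": True, "errors": True, "satisfaction": False}
    for result in compare_to_baseline(observations, baselines, lower_is_better):
        print(result)
```

A metric that misses its baseline flags a candidate
usability problem; the critical incidents recorded during
the same session then help explain why it occurred.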

2.4.  Summative  usability  evaluation 

The process of statistically comparing several
different systems or candidate designs, for example, to
determine which one is “better,” where better is
defined in advance, is called summative evaluation. In
contrast to formative evaluation, it is typically
performed after a product or some part of its design is
more or less complete. In practice, summative
evaluation can take many forms. The most common are the
comparative field trial, and more recently, the expert
review [14]. While both the field trial and expert
review methods are well-suited for design assessment,
they typically involve assessment of single prototypes
or field-delivered designs. The term summative
evaluation was also coined by Scriven [13] for use in
the instructional design field. As with formative
evaluation, human-computer interaction experts (e.g.,
[16]) and usability engineers have applied the theory
and practice of summative evaluation to interaction
design with very successful results.

In our experience, the empirical comparative approach
employing representative users, instantiated in the
summative evaluation process, is very effective for
analyzing strengths and weaknesses of various
well-formed candidate designs set within appropriate
user scenarios. However, it is the most costly type of
evaluation because it needs large numbers of users to
achieve statistical validity and reliability, and
because data analysis can be complex and challenging.
Unlike the other three types of evaluation we present,
results of this type of evaluation typically are
applicable across a variety of different applications,
since they give comparative outcomes for different
kinds of user interface components, features, and/or
interaction techniques spanning a number of diverse
user interfaces.

3.  Development  of  the  Battlefield  Augmented  Reality  System  (BARS)

3.1.  Overview  of  BARS 

Urban terrain is one of the most important and
challenging environments for current and future
peacekeepers and warfighters. Because of the increased
concentration of military operations in urban areas,
many future police and military operations will occur
in cities. However, urban terrain is also one of the
most demanding environments, with complicated
three-dimensional infrastructure potentially harboring
many types of risks [2].

A team of researchers from the Naval Research
Laboratory and Virginia Tech is developing the
Battlefield Augmented Reality System (BARS) [5, 8, 10]
to mitigate these warfighting difficulties through the
use of outdoor, mobile augmented reality. Augmented
reality is a display paradigm that mixes
computer-generated graphics with a user's view of the
real world. An example is shown in Figure 2. The user
wears a see-through head-mounted display that the
system tracks in six-degree-of-freedom space (position
and orientation). Computer graphics and/or text are
created and aligned from the user's perspective with
the objects to be augmented. By providing direct,
heads-up access to information correlated with a user’s
view of the real world, mobile augmented reality has
the potential to recast the way information is
presented to and accessed by a user.

A  user  wearing  BARS  is  shown  in  Figure  3.  Note 
the  head-mounted  display,  which  is  where  a  user  sees 
the  augmented  graphics  view  (such  as  in  Figure  2), 
dynamically  changing  as  the  user  moves  around. 


Figure  2.  An  example  of  augmented  reality  (AR), 
where  graphical  information  overlays  a  user’s  view  of 
the  real  world.  A  compass  shows  which  direction  the 
user  is  facing,  the  triangles  indicate  a  path  the  user  is 
following,  a  hidden  chemical  hazard  is  annotated,  and 
the  name  of  the  street  is  given.  Graphics  are  registered 
with  the  real  world,  so,  for  example,  triangles  appear 
to  be  painted  onto  the  road  surface.  The  result  is  an 
integrated  display  that  allows  heads-up  viewing  of  the 
augmenting  graphical  information. 


Figure  3.  User  wearing  BARS  equipment. 


Mobile augmented reality has many research challenges
related to the design of the user interface, one of
which is the “Superman X-ray vision problem” [12],
illustrated later in Figure 5. This problem
encapsulates the fundamental advantages and
disadvantages of mobile augmented reality. With such a
system, a user has “X-ray” vision and can “see”
non-visible objects (e.g., far-field objects that are
occluded by near-field objects) and information about
them. We have determined that this is a core scientific
issue in AR (at least for urban military settings, our
current application context), and are studying how best
to present these non-visible, occluded objects to the
user. This very challenging problem in AR user
interface design occurs because the occlusion cues must
be artificially created with graphics in order to
support natural human depth perception. Perceiving both
relative and absolute depth is a critical task in
military (and many other) situations, for a user to
quickly identify and correctly perceive an object’s or
several objects’ positions. For example, a dismounted
warrior might want to know whether a friendly tank or
squad is located between two specific buildings that
the warrior cannot see but is currently targeting for
fire (i.e., they are behind buildings the warrior can
see).

3.2.  BARS  usability  engineering  plan 

Figure 4 shows our plan for usability engineering
activities for BARS user interface development, and
indicates how all activities are interrelated.
Specifically, results from one activity inform the
subsequent activity. This plan is an instantiation of
activities from Figure 1, addressing both design and
evaluation of the BARS user interface. It allows us to
iteratively improve the BARS user interface by a
combination of techniques. This approach is based on
sequentially performing a domain analysis, then an
expert evaluation, followed by user-based statistical
and formative evaluations, with iteration as
appropriate within and among the types of evaluation.
This plan leverages the results of each individual
method by systematically defining and refining the BARS
user interface in a cost-effective progression.


Figure  4.  BARS  usability  engineering  plan. 

4.  Usability  evaluation  activities  for  BARS 


Team members participating in usability engineering
activities for BARS include personnel from the Naval
Research Laboratory (software and system developers and
user interface design experts), Virginia Tech
(usability engineers), Columbia University (AR user
interface design expert), and a USMCR Captain, who
served the critical role of subject matter expert.
During usability engineering activities prior to
evaluation, such as domain analysis, we created a
specific scenario for BARS, to represent a realistic
and significant warfighting task situation in an urban
setting [5]. Then we analyzed the scenario to produce
user-centered requirements.

Interestingly, producing the user-centered requirements
drove an important design decision. We realized that
our user-centered requirements identified a list of
features that could not be easily delivered by any
current AR system. For example, one BARS user-centered
requirement said that the system must be able to
display the location of hidden and occluded objects
(e.g., personnel or vehicles located somewhere behind a
visible building). This raised numerous user interface
design questions related to occluded objects and how
they should be presented graphically to a user (the
‘X-ray vision’ problem mentioned in Section 3). To
address such issues, we began expert evaluations on an
evolving BARS user interface design.

4.1.  BARS  expert  usability  evaluation 

During six cycles of expert evaluation over a two-month
period, summarized in Table 1, we designed
approximately 100 mockups depicting various potential
designs for representing occlusion, systematically
varying drawing parameters such as:

• Lines: intensity, style, thickness

• Shading: intensity, style, fill, transparency

• Hybrid techniques employing combinations of lines and
shadings

We were specifically examining several aspects of
occlusion, including how best to visually represent
occluded information and objects, the number of
discriminable levels (layers) of occlusion, and
variations on the drawing parameters listed previously.
In each cycle of expert evaluation, team members
individually examined a set of occlusion
representations (set size ranged from 5 to 30 mockups
in a cycle), which were created using Adobe Photoshop
and Microsoft PowerPoint, employing video to capture
real-world scenes as background images. Team members
each independently performed an expert evaluation of
electronically shared mockups in advance of extensive
teleconference calls. During the calls, we shared our
individual expert evaluation results, compiled our
assessments, and collaboratively determined how to
design the next set of mockup representations, informed
by results of the current cycle. Because the mockups
supported a very quick turn-around, we were able to
evaluate many more designs than could have been
implemented “live” in BARS. In fact, this use of
mockups was extremely cost-effective, allowing the team
to begin substantive usability evaluation work even
before many BARS features were implemented.

Table 1. Summary of expert evaluations to evolve BARS user interface designs for occlusion.

| Cycle No. | Purpose of this Evaluation Cycle | Medium for this Evaluation Cycle | Results / Findings |
|---|---|---|---|
| 1 | Initial expert evaluation and overview of BARS | BARS system | • Focus usability engineering efforts on (1) tracking and registration, (2) occlusion, and (3) distance estimation. |
| 2 | First cut at representing occlusion in MOUT (military operations in urban terrain) | 5 interface mockups including line-based building outlines and personnel representations | • Tracking study will require time to build cage (see Figure 6); focus on occlusion in the interim. |
| 3 | Examine large set of mockups that redundantly encode occlusion using various line drawing attributes | 25 interface mockups systematically varying different types of line width, intensity, and style | • Line intensity and thickness appear to be the most powerful (consistently recognizable) encoding mechanisms, followed by line style. • Color and intensity of the scene can create misleading cues when using color and intensity together as an encoding scheme. |
| 4 | Continue to examine previous set of occlusion representations | 25 interface mockups systematically varying different types of line width, intensity, and style | • Number of occluded layers that can be discriminably (effectively) represented by line-based encoding is three or four. |
| 5 | Examine additional visual cues to aid in distance estimation; examine use of filled polygons to represent occlusion in interior spaces | 14 interface mockups using various shadings of occluded objects to show distance as well as occlusion in interior spaces | • Distance cues should be overlaid onto the ground and should be easily turned off and on by the user. • Motion parallax may help resolve some problems. • Number of occluded layers that can be discriminably (effectively) represented by shading-based encoding is three or four. |
| 6 | Examine shaded polygonal representations in a complex outdoor environment (Columbia campus), as well as hybrid designs employing lines; examine effects of motion parallax on encodings | 30 interface mockups (5 mockups per set, 6 sets) systematically varying representations of occlusions employing filled (shaded) polygons, transparency, and lines. Mockups also simulated motion parallax by paging between images in a set. | • A combination of shaded polygons and line width is the most powerful encoding. • Distance encoding may be more powerful than simple occlusion. • Users should be able to push and pull the three to four levels of representation into and out of their real-world scene. |

Cycle 1 (see Table 1) served to indicate that, in fact,
the mockups were an effective way of performing expert
evaluations. In cycles 2 through 4, we specifically
studied line-based encodings, and our results showed
that line intensity appeared to be the most powerful
(i.e., consistently recognizable) line-only drawing
parameter, followed by line style. Further, line-based
representations were discriminable at only three or
four levels of occlusion. Interestingly, we found a few
instances when color and intensity created misleading
cues when used in combination as the encoding scheme.
In cycle 5, we studied distance estimation and
shading-based representations. Results indicated that
shading alone may not be enough to indicate distances;
user-controllable overlaying of distance cues onto the
ground may be necessary. Again we found that
shading-based representations were also discriminable
at only three or four levels of occlusion. In cycle 6,
we combined both line- and shading-based
representations into some hybrid designs, hoping to
maximize the best characteristics of each type of
representation. In particular, we found that a hybrid
of shaded regions and line width, both with varying
intensity, appeared to be the most powerful,
discriminable representation for occluded objects.
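
To make the idea of a hybrid, redundantly encoded
representation concrete, the sketch below maps an
occlusion layer to a line width, a line intensity, and a
fill intensity. It is only an illustration: the function
name, the numeric ranges, and the choice to make every
parameter decrease with depth are hypothetical, and BARS
itself may encode layers differently.

```python
def encode_occlusion_layer(layer: int, max_layers: int = 4) -> dict:
    """Map an occlusion layer (0 = nearest occluded layer) to drawing parameters.

    Line width, line intensity, and fill intensity all decrease together as
    the layer gets deeper, so the same ordering is encoded redundantly.
    All numeric ranges are hypothetical placeholders.
    """
    if max_layers < 1:
        raise ValueError("max_layers must be at least 1")
    if not 0 <= layer < max_layers:
        raise ValueError("layer must be in the range [0, max_layers)")
    # Fraction of the way from the nearest layer (0.0) to the farthest (1.0).
    t = layer / (max_layers - 1) if max_layers > 1 else 0.0
    return {
        "line_width_px": round(4.0 - 3.0 * t, 2),   # 4 px down to 1 px
        "line_intensity": round(1.0 - 0.6 * t, 2),  # 1.0 down to 0.4
        "fill_intensity": round(0.8 - 0.6 * t, 2),  # 0.8 down to 0.2
    }


if __name__ == "__main__":
    for layer in range(4):
        print(layer, encode_occlusion_layer(layer))
```

The default of four layers reflects the finding that only
three or four levels of occlusion were discriminable.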

Further, at this point, based on the relatively small
changes we were making to the mockups, we felt we had
iterated to an optimal set of representations for
occlusion, so we chose to move on to formative
evaluations using them. However, in retrospect (and as
part of continually evolving and improving our
cost-effective progression of usability evaluation -
see Section 5), we realized it would have been
scientifically advantageous to have run the user-based
statistical evaluations next, to evolve
empirically-derived user interface designs for our BARS
formative evaluation. So, even though we did not
perform them until after formative evaluations on BARS,
we will discuss the user-based statistical evaluations
next.

4.2.  BARS  user-based  statistical  evaluation 

Our prior evaluations of BARS led us logically to
critical design factors, in this case graphical
techniques for displaying the ordering and distance of
occluded objects, that needed statistical, empirical
confirmation with users. Specifically, we determined
from our results that a critical yet tenable set of
factors and their values for a user-based statistical
evaluation were:

• Drawing style - line, filled, line+fill (shading)

• Opacity - constant, increasing with levels of
occlusion

• Intensity - constant, decreasing with levels of
occlusion

• Ground plane - on, off
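
The size of the design space implied by these factors can
be seen by enumerating every combination of their values.
The sketch below assumes a full factorial crossing, which
is an assumption made here for illustration; the design
actually used is described in [9]. The identifier names
are hypothetical.

```python
from itertools import product

# Factors and values as listed above; the key names are illustrative.
FACTORS = {
    "drawing_style": ["line", "filled", "line+fill"],
    "opacity": ["constant", "increasing"],
    "intensity": ["constant", "decreasing"],
    "ground_plane": ["on", "off"],
}


def full_factorial(factors):
    """Return one dict per combination of factor values."""
    names = list(factors)
    return [dict(zip(names, values)) for values in product(*factors.values())]


if __name__ == "__main__":
    conditions = full_factorial(FACTORS)
    print(len(conditions))  # 3 * 2 * 2 * 2 = 24 combinations
    print(conditions[0])
```

Keeping the factor set small matters: each added
two-valued factor doubles the number of conditions a
subject would have to see.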

Our reasoning behind choices for each factor is
detailed in [9]. The study was run with eight subjects,
who saw a small virtual world that consisted of
representations of three blue buildings and a red
target object, overlaid, of course, on the real world.
A display from one of the evaluation trials is shown in
Figure 5.

The user’s task was to indicate the location of the
target (near, middle, or far position) as it moved among
buildings from trial to trial. We examined time to perform
tasks as well as task accuracy under various experimental
conditions. Our results from this evaluation
are reported in full in [9]. To briefly summarize, subjects
made 79% correct choices and 21% erroneous
choices of the target location during trials. User errors
fell into two categories: the target could be closer than
the user’s answer, or farther than the user’s answer.
Subjects were most accurate when the target was in the
far position; only 17.3% of their erroneous choices
were made when the target was in the far position, as
compared to 38.6% in the near position, and 44.2% in
the middle position. Other findings indicate that the
‘line+fill’ drawing style yielded the best accuracy,
confirming our expert evaluation results.
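
To make the error figures above easier to compare, the following
sketch converts the reported shares of erroneous choices into
fractions of all trials. It also derives an approximate error rate
per target position under the assumption that each position was
presented equally often. That assumption is ours for illustration
and is not stated in this section; see [9] for the actual trial
counts. The function names are hypothetical.

```python
# Figures reported in the text.
ERROR_RATE = 0.21  # fraction of all choices that were erroneous
ERROR_SHARE = {"near": 0.386, "middle": 0.442, "far": 0.173}


def errors_as_fraction_of_all_trials(error_rate, error_share):
    """Fraction of ALL trials that were errors at each target position."""
    return {pos: error_rate * share for pos, share in error_share.items()}


def per_position_error_rate(error_rate, error_share):
    """Approximate error rate within each target position.

    ASSUMPTION: every position was shown in an equal number of trials,
    so each position accounts for 1/len(error_share) of all trials.
    """
    n_positions = len(error_share)
    return {
        pos: error_rate * share * n_positions
        for pos, share in error_share.items()
    }


for pos, rate in per_position_error_rate(ERROR_RATE, ERROR_SHARE).items():
    print(f"{pos}: {rate:.1%}")
```

Under the equal-presentation assumption, this gives error rates of
roughly 24% (near), 28% (middle), and 11% (far), consistent with the
observation that subjects were most accurate in the far position.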


Figure 5. An example of a BARS user's view of real-world
buildings augmented with overlaid graphics to
indicate occluded (hidden) buildings. The overlaid
information can contain text, bitmaps, or any
computer-generated visual data. In this example, the
lighter the shading of the object, the further away it is.

Overall, our results indicate that we evolved an effective
and efficient set of graphical representations for
occlusion, by applying our usability engineering methodology.
These representations are being incorporated
into the BARS user interface. Further, once the larger
BARS user interface setting is adequately expanded
and refined (i.e., using iterative expert evaluation and
additional user-based statistical studies to refine other
atomic user interface components and features), we
expect to conduct further usability evaluations that
employ comprehensive user tasks and task flows. Additional
user-based statistical evaluations may, for example,
study other core scientific issues in AR such as
acceptable registration error (how far off the augmenting
graphics can be from the real-world object) in
terms of user performance and (much like occlusion)
what visual representations best support distance perception
and estimation for a user.

4.3.  BARS  formative  usability  evaluation

Continuing with our study of occlusion, we created
a formal set of user tasks, and had five individual subjects
perform the set of tasks. Three subjects were Marines
and two were user interface/AR experts. The
tasks were militarily relevant, inspired by our urban
warfighting scenario. In the tasks, users were asked to
find explicit information from the augmenting graphics
that they could see. Some simple examples included
answering questions such as:

•  Which  enemy  platoon  is  nearest  you? 

•  Where  are  restricted  fire  areas?  Where  are  other 
friendly  forces? 

•  Estimate  the  distance  between  the  enemy  squad 
and  yourself. 

•  What  direction  is  the  enemy  tank  traveling? 

Having anticipated the challenge of working in an
outdoor, mobile, highly dynamic environment, team
members had to consider innovative approaches to
usability evaluation. Our solution was to design and
build a specially constructed motion tracking cage so
that BARS could accurately track the user and accurately
register graphics onto the real world. The cage
provided a mounting platform for Intersense IS900
tracking rails, which are currently in common use for
AR tracking. While clearly not usable in a final,
fielded outdoor AR system, mounting the tracking rails
on top of the cage gave us adequate tracking performance
to meet our user task requirements, without waiting
for completion of a totally mobile outdoor prototype
AR tracking system with the required performance.
The main tradeoff was that the user was not able
to freely walk large distances, as envisioned in the final
BARS. We therefore focused on tasks related to scanning
the urban environment from the area covered by
the tracking cage. Our setup also included auxiliary
evaluator’s monitors to provide evaluators an accurate
display of a user’s view. Our outdoor BARS evaluation
equipment setup is shown in Figure 6.

Our overall formative evaluation results showed
that users performed approximately 85% of the tasks
correctly and efficiently with less than 10 minutes of
training using BARS. Users liked having multiple
views of various graphical augmentations, and liked
being able to develop strategies to manipulate the scene
and understand how BARS works. They stated that
they were able to gain situation awareness from using
BARS. Users disliked use of wireframes (lines) as the
main augmentation representation, saying that it made
the scene too cluttered. They also disliked some of the
controls for manipulating augmentations (e.g., making
them appear/disappear), but these controls are temporary,
only for our evaluation studies, and are not intended
to be included in a deployable BARS. Many of
our results supported findings from our earlier expert
evaluations, such as that objects must be perceived as
three-dimensional and our hypothesis that no more than
three or four levels of occlusion are discriminable. We
made new findings such as the fact that three-dimensionality
of occluded objects was easier to perceive
in shaded objects than in line-drawn objects. All
users had a very positive, enthusiastic reaction to
BARS and its capabilities. Our experience during the
formative evaluation led us to determine that the problem
of representing occluded objects in AR required
more attention, and specifically required us to design
studies to determine what visual design factors (for
occluded objects) were most effective, independent of
other user interface components (e.g., text labels).


Figure  6.  Outdoor  tracking  cage  setup  for  BARS 
formative  evaluation  study.  The  cage  has  overhead 
tracking  rails  (barely  visible  under  the  blue  canopy)  so 
that  the  augmenting  graphics  can  change  as  a  user 
moves  around. 

4.4.  BARS  summative  usability  evaluation 

We are still performing user-based statistical
evaluations and formative evaluations on the BARS
user interface. There is still much work to be done on
the occlusion issue, as well as a variety of other challenges
including tracking/registration error and distance
perception/estimation. As such, we have not yet
conducted comparative summative evaluations of the
BARS user interface.

5.  Conclusions:  A  cost-effective  usability  evaluation  progression

As depicted in Figure 7, our work over the past
several years has shown that progressing from expert
evaluation to user-based statistical evaluation to formative
evaluation to summative evaluation is an efficient
and cost-effective strategy for assessing and improving
a user interface design [7].


Figure  7.  A  cost-effective  usability  evaluation 
progression. 

Our expert evaluations of BARS identified obvious
usability problems or missing functionality early in
the BARS development life cycle, thus allowing improvements
to the user interface prior to performing
user-based statistical and formative evaluations. If
expert evaluations are not performed prior to user-based
statistical and formative evaluations, these
evaluations will typically take longer and require more
users, and yet reveal many of the same usability problems
that could have been discovered by less expensive
expert evaluations. In cases where user interface design
demands that new metaphors, interaction techniques,
or user interface components be created, user-based
statistical studies are an efficient method for determining
what design factors are most critical for a
particular user interface component or feature. These
refined components can then be migrated into a mature
user interface that is primed for formative usability
evaluation.

Once evolving user interface designs have been
expertly and formatively evaluated, experimenters
can have confidence that those designs are comparable
in terms of their usability, which in turn supports a
compelling comparative summative study. Otherwise, the
expensive summative evaluations may be essentially
comparing “good apples” to “bad oranges” [7]. Specifically,
a summative study of different application
interfaces may be comparing one design that is inherently
better, in terms of usability, than the other ones.
Developing all designs used in a summative study following
our suggested progression of usability engineering
activities should lead to a more valid comparison.
Moreover, in our BARS work, we found that results
from our user-based statistical studies are efficiently
driving user interface design for our formative
evaluations. We further expect the formative evaluations,
in turn, to inform the design of summative studies
by helping determine critical usability characteristics
to evaluate and compare.

While this paper reports only on our usability engineering
activities with BARS, we have been involved
with and led these activities for a broad variety of applications
over the past two decades (e.g., [4, 7, 9, 10]).
A continual and overarching goal of all our usability
engineering work is to develop, apply, and extend
methods for improving the usability of interactive
software applications. In particular, we have focused
on developing, applying, and, when necessary, extending
high-impact processes for evaluating usability.
Our work has produced a cost-effective progression of
usability engineering activities.